Binary Neural Architecture Search
FIGURE 4.7
The operations on each edge. Each edge has four convolutional operations, including two types of binarized convolution with 3 × 3 or 5 × 5 receptive fields, and four non-convolutional operations.
4.3.2 Search Space
We search for computation cells as the building blocks of the final architecture. As in
[305, 306, 151], we construct the network with a predefined number of cells, and each cell
is a fully connected directed acyclic graph (DAG) G with M nodes, {N1, N2, ..., NM}. For
simplicity, we assume that each cell only takes the outputs of the two previous cells as
input and each input node has predefined convolutional operations for preprocessing. Each
node Nj is computed as Nj = Σ_{i<j} o^{(i,j)}(Ni), where each Ni is a predecessor of Nj;
the constraint i < j avoids cycles within the cell. We also define the nodes N−1 and N0 without
input as the first two nodes of a cell. Each node is a specific tensor as a feature map, and
each directed edge (i, j) denotes an operation o^{(i,j)}(·), which is sampled from the following
K = 8 operations:
• no connection (zero)
• skip connection (identity)
• 3 × 3 dilated convolution with rate 2
• 5 × 5 dilated convolution with rate 2
• 3 × 3 max pooling
• 3 × 3 average pooling
• 3 × 3 depth-wise separable convolution
• 5 × 5 depth-wise separable convolution
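The node computation above can be sketched in code. The following is a minimal illustration of how a cell aggregates its nodes as Nj = Σ_{i<j} o^{(i,j)}(Ni); the operation set here uses simplified stand-ins on 1-D vectors (the dilated, pooling, and depth-wise layers of the actual K = 8 operations are not spelled out in this excerpt), and all names are hypothetical.

```python
import numpy as np

# Simplified stand-ins for the candidate edge operations (the chapter
# uses K = 8, including binarized convolutions and pooling layers).
OPS = {
    "zero": lambda x: np.zeros_like(x),  # no connection
    "identity": lambda x: x,             # skip connection
    "scale": lambda x: 0.5 * x,          # placeholder for a conv operation
}

def cell_forward(n_prev2, n_prev1, edge_ops, num_nodes):
    """Compute the nodes of one cell as a fully connected DAG.

    Each node N_j = sum over i < j of o^(i,j)(N_i); the first two
    entries (N_-1 and N_0 in the text) are the outputs of the two
    previous cells. List indices 0, 1, 2, ... stand in for the text's
    node indices -1, 0, 1, ...
    """
    nodes = [n_prev2, n_prev1]
    for j in range(2, 2 + num_nodes):
        nodes.append(sum(OPS[edge_ops[(i, j)]](nodes[i]) for i in range(j)))
    return nodes

x = np.ones(4)
ops = {(0, 2): "identity", (1, 2): "zero",
       (0, 3): "scale", (1, 3): "identity", (2, 3): "identity"}
out = cell_forward(x, 2 * x, ops, num_nodes=2)
# node 2 = identity(x) + zero(2x) = x
# node 3 = 0.5*x + 2x + x = 3.5x
```

In a real search, the operation chosen for each edge (i, j) is what the architecture search optimizes over.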
We replace the depth-wise separable convolutions with binarized forms, as shown in
Figs. 4.7 and 4.8. Optimizing BNNs is more challenging than optimizing conventional
CNNs [77, 199], since binarization places an additional burden on NAS.
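To make the binarized depth-wise stage concrete, here is a minimal sketch of weight binarization and a depth-wise convolution using the binarized weights. The sign-with-scaling-factor scheme (alpha = mean absolute weight, as in XNOR-Net-style BNNs) is an assumption for illustration; this excerpt does not state exactly which binarization scheme CP-NAS uses, and the 1-D "valid" convolution is a simplification of the 3 × 3 layer in Fig. 4.8.

```python
import numpy as np

def binarize_weights(w):
    """Binarize a weight tensor to {-alpha, +alpha}.

    alpha = mean absolute value, a common XNOR-Net-style scaling;
    assumed here for illustration, not stated in the text.
    """
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

def binarized_dwise_conv1d(x, w):
    """Depth-wise 1-D 'valid' convolution with binarized weights.

    x: (channels, length); w: (channels, kernel). Each channel is
    convolved with its own binarized filter, as in the depth-wise
    stage of a separable convolution.
    """
    wb = binarize_weights(w)
    c, k = w.shape
    out_len = x.shape[1] - k + 1
    out = np.empty((c, out_len))
    for ch in range(c):
        for t in range(out_len):
            out[ch, t] = np.dot(x[ch, t:t + k], wb[ch])
    return out

w = np.array([[0.2, -0.4], [0.1, 0.3]])    # alpha = 0.25
x = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])
y = binarized_dwise_conv1d(x, w)
```

Because every weight becomes ±alpha, the multiply-accumulate in the inner loop can be replaced by sign flips and additions in a real 1-bit implementation, which is the source of the efficiency gain.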
[Figure: the original block (left) applies a depth-wise 3 × 3 convolution (channel = M) followed by a 1 × 1 convolution (channel = N) to the input (channel = M), with BN + ReLU; the binarized block (right) applies a binarized depth-wise 3 × 3 convolution (channel = M) followed by a binarized 1 × 1 convolution (channel = N), with BN.]
FIGURE 4.8
Compared with the original depth-wise separable convolution (left), a new binarized depth-wise separable convolution is designed for CP-NAS (right).